289 research outputs found

    Optimal-Hash Exact String Matching Algorithms

    String matching is the problem of finding all the occurrences of a pattern in a text. We propose improved versions of the fast family of string matching algorithms based on hashing $q$-grams. The improvement consists of considering minimal values of $q$ such that each $q$-gram of the pattern has a unique hash value. The new algorithms are faster than the algorithms of the HASH family for short patterns on large alphabets. Comment: 14 pages
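
    A minimal Python sketch of the general idea behind shift tables built from hashed $q$-grams. It is not the authors' algorithm: the hash function, the table size, and the name `qgram_hash_search` are assumptions made for illustration, and $q$ is passed as a parameter rather than chosen minimally.

```python
def qgram_hash_search(pattern, text, q=3, table_size=256):
    """Return all start positions of pattern in text.

    Illustrative only: the hash and table_size are arbitrary choices,
    not those of the HASH family or of the paper listed above.
    """
    m, n = len(pattern), len(text)
    if m < q:
        # Pattern too short to contain a q-gram: fall back to a naive scan.
        return [i for i in range(n - m + 1) if text.startswith(pattern, i)]
    if n < m:
        return []

    def h(s, i):
        # Hash of the q-gram s[i:i+q] (hypothetical hash function).
        v = 0
        for c in s[i:i + q]:
            v = (v * 2 + ord(c)) % table_size
        return v

    # shift[x] = distance from the end of the rightmost pattern q-gram
    # hashing to x to the end of the pattern; m - q + 1 if there is none.
    shift = [m - q + 1] * table_size
    for i in range(m - q + 1):
        shift[h(pattern, i)] = m - q - i

    occurrences = []
    pos = m - 1  # text index aligned with the last pattern character
    while pos < n:
        s = shift[h(text, pos - q + 1)]
        if s == 0:
            # The last q-gram hashes like the pattern's last q-gram:
            # verify the whole window, because hashes may collide.
            start = pos - m + 1
            if text[start:start + m] == pattern:
                occurrences.append(start)
            pos += 1
        else:
            pos += s
    return occurrences
```

    In this sketch a smaller table causes more hash collisions, which only costs extra verifications; the reported positions stay the same.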

    A fast implementation of the Boyer-Moore string matching algorithm

    Manuscript, http://www-igm.univ-mlv.fr/~lecroq/articles/cl2008.pd

    Efficient Pattern Matching on Binary Strings

    The binary string matching problem consists in finding all the occurrences of a pattern in a text where both strings are built on a binary alphabet. This is an interesting problem in computer science, since binary data are omnipresent in telecom and computer network applications. Moreover, the problem also finds applications in image processing and in pattern matching on compressed texts. Recently it has been shown that adaptations of classical exact string matching algorithms are not very efficient on binary data. In this paper we present two efficient algorithms for the problem, designed to avoid any reference to individual bits so that pattern and text can be processed byte by byte. Experimental results show that the new algorithms outperform existing solutions in most cases. Comment: 12 pages

    Algorithms for Computing Abelian Periods of Words

    Constantinescu and Ilie (Bulletin EATCS 89, 167--170, 2006) introduced the notion of an Abelian period of a word. A word of length $n$ over an alphabet of size $\sigma$ can have $\Theta(n^{2})$ distinct Abelian periods. The Brute-Force algorithm computes all the Abelian periods of a word in time $O(n^2 \times \sigma)$ using $O(n \times \sigma)$ space. We present an off-line algorithm based on a select function having the same worst-case theoretical complexity as the Brute-Force one, but outperforming it in practice. We then present on-line algorithms that also make it possible to compute all the Abelian periods of all the prefixes of $w$. Comment: Accepted for publication in Discrete Applied Mathematics

    Fast Computation of Abelian Runs

    Given a word $w$ and a Parikh vector $\mathcal{P}$, an abelian run of period $\mathcal{P}$ in $w$ is a maximal occurrence of a substring of $w$ having abelian period $\mathcal{P}$. Our main result is an online algorithm that, given a word $w$ of length $n$ over an alphabet of cardinality $\sigma$ and a Parikh vector $\mathcal{P}$, returns all the abelian runs of period $\mathcal{P}$ in $w$ in time $O(n)$ and space $O(\sigma+p)$, where $p$ is the norm of $\mathcal{P}$, i.e., the sum of its components. We also present an online algorithm that computes all the abelian runs with periods of norm $p$ in $w$ in time $O(np)$, for any given norm $p$. Finally, we give an $O(n^2)$-time offline randomized algorithm for computing all the abelian runs of $w$. Its deterministic counterpart runs in $O(n^2\log\sigma)$ time. Comment: To appear in Theoretical Computer Science

    A Note on Easy and Efficient Computation of Full Abelian Periods of a Word

    Constantinescu and Ilie (Bulletin of the EATCS 89, 167-170, 2006) introduced the idea of an Abelian period with head and tail of a finite word. An Abelian period is called full if both the head and the tail are empty. We present a simple and easy-to-implement $O(n\log\log n)$-time algorithm for computing all the full Abelian periods of a word of length $n$ over a constant-size alphabet. Experiments show that our algorithm significantly outperforms the $O(n)$ algorithm proposed by Kociumaka et al. (Proc. of STACS, 245-256, 2013) for the same problem. Comment: Accepted for publication in Discrete Applied Mathematics
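
    To make the notion of a full Abelian period concrete, here is a brute-force Python sketch. It is not the $O(n\log\log n)$ algorithm of the paper. It assumes the usual reading of the definition: with empty head and tail, a word of length $n$ has full Abelian period $p$ when $p$ divides $n$ and all consecutive blocks of length $p$ contain the same letters with the same multiplicities. The function name `full_abelian_periods` is hypothetical.

```python
from collections import Counter


def full_abelian_periods(w):
    """Return every p such that w splits into blocks of length p
    that all have the same letter counts (brute-force illustration)."""
    n = len(w)
    periods = []
    for p in range(1, n + 1):
        if n % p != 0:
            continue  # with empty head and tail, p must divide n
        first = Counter(w[:p])
        # Compare the letter counts of each later block with the first.
        if all(Counter(w[i:i + p]) == first for i in range(p, n, p)):
            periods.append(p)
    return periods


print(full_abelian_periods("abbaab"))  # [2, 6]
```

    Under this reading the whole word is a single block, so $p = n$ is always reported.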

    Efficient validation and construction of border arrays

    In this article we present an on-line linear time and space algorithm to check if an integer array f is the border array of at least one string w built on a bounded or unbounded size alphabet Σ. We first show some relations between the border array of some string w and the skeleton of the DFA recognizing Σ*·w, independently of the explicit knowledge of w. This enables us to design algorithms for validating and generating border arrays that outperform existing ones [4, 3]. The validating algorithm lowers the delay (time spent on one element of the array) from O(|w|) to O(min{|Σ|, |w|}) compared to the algorithms in [4, 3]. Finally we give some results on the numbers of distinct border arrays for some alphabet sizes.
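
    As a rough illustration of what validating a border array means, the Python sketch below computes border arrays and runs a simple off-line check for the unbounded-alphabet case. It is not the on-line algorithm of the article. It assumes the convention that f[i] is the length of the longest proper border of w[0..i], and the names `border_array` and `is_border_array` are hypothetical.

```python
def border_array(w):
    """f[i] = length of the longest proper border of w[0..i]."""
    f = [0] * len(w)
    k = 0
    for i in range(1, len(w)):
        # Follow shorter borders until one can be extended by w[i].
        while k > 0 and w[i] != w[k]:
            k = f[k - 1]
        if w[i] == w[k]:
            k += 1
        f[i] = k
    return f


def is_border_array(f):
    """Off-line check over an unbounded alphabet (letters are integers).

    Build the only candidate string f allows, using a fresh letter
    wherever f[i] == 0, then recompute its border array and compare.
    """
    w = []
    fresh = 0
    for i, b in enumerate(f):
        if not 0 <= b <= i:
            return False  # a proper border of w[0..i] has length <= i
        if b > 0:
            w.append(w[b - 1])  # a border of length b forces this letter
        else:
            w.append(fresh)     # new letter, so no border ends here
            fresh += 1
    return border_array(w) == list(f)


print(border_array("abaab"))             # [0, 0, 1, 1, 2]
print(is_border_array([0, 0, 1, 1, 2]))  # True
print(is_border_array([0, 0, 2]))        # False
```

    This check ignores alphabet-size limits; for a bounded alphabet one would also have to make sure the number of distinct letters needed does not exceed |Σ|.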